-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: expressify lower_bound
and upper_bound
in is_between
#1672
feat: expressify lower_bound
and upper_bound
in is_between
#1672
Conversation
def sum(self: Self, *column_names: str) -> ArrowExpr: | ||
return ArrowExpr.from_column_names( | ||
*column_names, backend_version=self._backend_version, version=self._version | ||
).sum() | ||
|
||
def mean(self: Self, *column_names: str) -> ArrowExpr: | ||
return ArrowExpr.from_column_names( | ||
*column_names, backend_version=self._backend_version, version=self._version | ||
).mean() | ||
|
||
def median(self: Self, *column_names: str) -> ArrowExpr: | ||
return ArrowExpr.from_column_names( | ||
*column_names, backend_version=self._backend_version, version=self._version | ||
).median() | ||
|
||
def max(self: Self, *column_names: str) -> ArrowExpr: | ||
return ArrowExpr.from_column_names( | ||
*column_names, backend_version=self._backend_version, version=self._version | ||
).max() | ||
|
||
def min(self: Self, *column_names: str) -> ArrowExpr: | ||
return ArrowExpr.from_column_names( | ||
*column_names, backend_version=self._backend_version, version=self._version | ||
).min() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive-by - it's redundat to define all of these in the CompliantExpr
s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one! I love to see net negative in PRs π
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a minor comment on the unit test.
The approach used here to expressify is_between
seems generic enough to be used elsewhere, is that right?! I somehow thought it was going to be fairly more complex, yet extract_compliant
together with <>_and_extract_native
seem all we need ππΌπ
def sum(self: Self, *column_names: str) -> ArrowExpr: | ||
return ArrowExpr.from_column_names( | ||
*column_names, backend_version=self._backend_version, version=self._version | ||
).sum() | ||
|
||
def mean(self: Self, *column_names: str) -> ArrowExpr: | ||
return ArrowExpr.from_column_names( | ||
*column_names, backend_version=self._backend_version, version=self._version | ||
).mean() | ||
|
||
def median(self: Self, *column_names: str) -> ArrowExpr: | ||
return ArrowExpr.from_column_names( | ||
*column_names, backend_version=self._backend_version, version=self._version | ||
).median() | ||
|
||
def max(self: Self, *column_names: str) -> ArrowExpr: | ||
return ArrowExpr.from_column_names( | ||
*column_names, backend_version=self._backend_version, version=self._version | ||
).max() | ||
|
||
def min(self: Self, *column_names: str) -> ArrowExpr: | ||
return ArrowExpr.from_column_names( | ||
*column_names, backend_version=self._backend_version, version=self._version | ||
).min() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one! I love to see net negative in PRs π
def test_is_between_expressified(constructor: Constructor) -> None: | ||
data = {"a": [1, 4, 2, 5], "b": [0, 5, 2, 4], "c": [9, 9, 9, 9]} | ||
df = nw.from_native(constructor(data)) | ||
result = df.select(nw.col("a").is_between(nw.col("b"), nw.col("c"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we test with a generic expression instead of just col
?
Random proposal that would not change expected_dict
:
result = df.select(nw.col("a").is_between(nw.col("b"), nw.col("c"))) | |
result = df.select(nw.col("a").is_between(nw.col("b") * 0.9, nw.col("c") - 1)) |
On a second thought, we are creating some asymmetry between what we can pass to some_series.is_between(lower_bound_series, upper_bound_series) Apologies if this is already possible, if so we may want to add a test case for it |
thanks for your review! yup, this is what we do in |
What type of PR is this? (check all applicable
closes #1659
Related issues
Checklist
If you have comments or can explain your changes, please do so below